{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## <font color='green'> Regression (Dense)</font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'> 1. Description</font>\n",
    "\n",
    "This notebook uses the \"Relative location of CT slices on axial axis\" data set, which was derived from 53500 CT images of 74 different patients (43 male, 31 female).\n",
    "The dataset can be downloaded from https://archive.ics.uci.edu/ml/machine-learning-databases/00206/slice_localization_data.zip\n",
    "\n",
    "Each CT slice is described by two histograms in polar space. The first histogram describes the location of bone structures in the image; the second describes the location of air inclusions inside the body.\n",
    "\n",
    "The target variable (the relative location of an image on the axial axis) was constructed by manually annotating up to 10 distinct landmarks with known locations in each CT volume. The locations of slices between landmarks were interpolated."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'> 2. Data Preprocessing </font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import time\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from collections import OrderedDict"
   ]
  },
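  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "def preprocess_data(fname):\n",
    "    '''\n",
    "    Load the CT slice data, drop the patient identifier, and split the\n",
    "    rows into train (80%) and test (20%) sets.\n",
    "    Returns x_train, x_test, y_train, y_test.\n",
    "    '''\n",
    "    df = pd.read_csv(fname)\n",
    "    # patientId only identifies the patient; it is not used as a feature\n",
    "    df = df.drop(columns=[\"patientId\"])\n",
    "    x = df.iloc[:, :-1]  # every remaining column except the last is a feature\n",
    "    y = df.iloc[:, -1]   # the last column is the target (relative slice location)\n",
    "    return train_test_split(x, y, test_size=0.20, random_state=42)"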
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "shape of train data: (42800, 384)\n",
      "shape of test data: (10700, 384)\n"
     ]
    }
   ],
   "source": [
    "#---- Data Preparation ----\n",
    "# Uncomment the lines below to download and unzip the dataset.\n",
    "#!wget -N https://archive.ics.uci.edu/ml/machine-learning-databases/00206/slice_localization_data.zip\n",
    "#!unzip -o slice_localization_data.zip\n",
    "#!mkdir -p datasets\n",
    "#!mv slice_localization_data.csv datasets\n",
    "\n",
    "DATA_FILE = \"datasets/slice_localization_data.csv\"\n",
    "x_train, x_test, y_train, y_test = preprocess_data(DATA_FILE)\n",
    "print(\"shape of train data: {}\".format(x_train.shape))\n",
    "print(\"shape of test data: {}\".format(x_test.shape))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'> 3. Algorithm Evaluation </font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_time = []\n",
    "test_time = []\n",
    "train_score = []\n",
    "test_score = []\n",
    "estimator_name = []"
   ]
  },
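  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "def evaluate(estimator, estimator_nm,\n",
    "             x_train, y_train,\n",
    "             x_test, y_test):\n",
    "    '''\n",
    "    Fit the estimator and append its name, training time, scoring time,\n",
    "    and train/test scores to the global result lists.\n",
    "    Works for both frovedis and sklearn estimators.\n",
    "    Note: the recorded \"test time\" covers scoring on both the train and test sets.\n",
    "    '''\n",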
    "    estimator_name.append(estimator_nm)\n",
    "\n",
    "    start_time = time.time()\n",
    "    estimator.fit(x_train, y_train)\n",
    "    train_time.append(round(time.time() - start_time, 4))\n",
    "\n",
    "    start_time = time.time()\n",
    "    train_score.append(estimator.score(x_train, y_train))\n",
    "    test_score.append(estimator.score(x_test, y_test))\n",
    "    test_time.append(round(time.time() - start_time, 4))"
   ]
  },
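  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `score` method called in `evaluate` returns the coefficient of determination $R^2$ for scikit-learn regressors; the frovedis estimators are assumed here to follow the same convention. The cell below is a minimal sketch on synthetic data that computes $R^2 = 1 - SS_{res} / SS_{tot}$ by hand and checks it against `score`. The names `x_demo`, `y_demo`, and `demo_est` are illustrative and are not part of the benchmark."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch only: this cell does not touch the benchmark result lists.\n",
    "import numpy as np\n",
    "from sklearn.linear_model import LinearRegression\n",
    "\n",
    "# Synthetic regression problem (hypothetical data, not the CT slice dataset)\n",
    "rng = np.random.RandomState(0)\n",
    "x_demo = rng.rand(200, 3)\n",
    "y_demo = x_demo @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.randn(200)\n",
    "\n",
    "demo_est = LinearRegression().fit(x_demo, y_demo)\n",
    "y_pred = demo_est.predict(x_demo)\n",
    "\n",
    "# R^2 = 1 - (residual sum of squares) / (total sum of squares)\n",
    "ss_res = np.sum((y_demo - y_pred) ** 2)\n",
    "ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)\n",
    "r2_manual = 1.0 - ss_res / ss_tot\n",
    "\n",
    "# The hand-computed value should agree with the estimator's score()\n",
    "assert np.isclose(r2_manual, demo_est.score(x_demo, y_demo))"
   ]
  },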
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.1 LinearRegression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "TARGET = \"lnr\"\n",
    "import frovedis\n",
    "from frovedis.exrpc.server import FrovedisServer\n",
    "FrovedisServer.initialize(\"mpirun -np 8 \" +  os.environ[\"FROVEDIS_SERVER\"])\n",
    "from frovedis.mllib.linear_model import LinearRegression as fLNR\n",
    "f_est = fLNR()\n",
    "E_NM = TARGET + \"_frovedis_\" + frovedis.__version__\n",
    "evaluate(f_est, E_NM, x_train, y_train, x_test, y_test)\n",
    "f_est.release()\n",
    "FrovedisServer.shut_down()\n",
    "\n",
    "import sklearn\n",
    "from sklearn.linear_model import LinearRegression as sLNR\n",
    "s_est = sLNR()\n",
    "E_NM = TARGET + \"_sklearn_\" + sklearn.__version__\n",
    "evaluate(s_est, E_NM, x_train, y_train, x_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.2 SGDRegressor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/adityaw/virt1/lib64/python3.6/site-packages/sklearn/linear_model/_stochastic_gradient.py:1223: ConvergenceWarning: Maximum number of iteration reached before convergence. Consider increasing max_iter to improve the fit.\n",
      "  ConvergenceWarning)\n"
     ]
    }
   ],
   "source": [
    "TARGET = \"sgd\"\n",
    "import frovedis\n",
    "from frovedis.exrpc.server import FrovedisServer\n",
    "FrovedisServer.initialize(\"mpirun -np 8 \" +  os.environ[\"FROVEDIS_SERVER\"])\n",
    "from frovedis.mllib.linear_model import SGDRegressor as fSGDReg\n",
    "f_est = fSGDReg(loss=\"squared_loss\", penalty=\"l2\",\n",
    "                eta0=0.00001)\n",
    "E_NM = TARGET + \"_frovedis_\" + frovedis.__version__\n",
    "evaluate(f_est, E_NM, x_train, y_train, x_test, y_test)\n",
    "f_est.release()\n",
    "FrovedisServer.shut_down()\n",
    "\n",
    "import sklearn\n",
    "from sklearn.linear_model import SGDRegressor as sSGDReg\n",
    "# NOTE: sklearn >= 1.0 renames this loss to \"squared_error\"; \"squared_loss\" was removed in 1.2.\n",
    "s_est = sSGDReg(loss=\"squared_loss\", penalty=\"l2\",\n",
    "                eta0=0.00001, random_state=42)\n",
    "E_NM = TARGET + \"_sklearn_\" + sklearn.__version__\n",
    "evaluate(s_est, E_NM, x_train, y_train, x_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.3 KNeighborsRegressor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "TARGET = \"knn\"\n",
    "import frovedis\n",
    "from frovedis.exrpc.server import FrovedisServer\n",
    "FrovedisServer.initialize(\"mpirun -np 8 \" +  os.environ[\"FROVEDIS_SERVER\"])\n",
    "from frovedis.mllib.neighbors import KNeighborsRegressor as fKNR\n",
    "f_est = fKNR(n_neighbors=1000, metric=\"euclidean\",\n",
    "             algorithm=\"brute\")\n",
    "E_NM = TARGET + \"_frovedis_\" + frovedis.__version__\n",
    "evaluate(f_est, E_NM, x_train, y_train, x_test, y_test)\n",
    "f_est.release()\n",
    "FrovedisServer.shut_down()\n",
    "\n",
    "import sklearn\n",
    "from sklearn.neighbors import KNeighborsRegressor as sKNR\n",
    "s_est = sKNR(n_neighbors=1000, metric=\"euclidean\",\n",
    "             algorithm=\"brute\")\n",
    "E_NM = TARGET + \"_sklearn_\" + sklearn.__version__\n",
    "evaluate(s_est, E_NM, x_train, y_train, x_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.4 Ridge"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "TARGET = \"ridge\"\n",
    "import frovedis\n",
    "from frovedis.exrpc.server import FrovedisServer\n",
    "FrovedisServer.initialize(\"mpirun -np 8 \" +  os.environ[\"FROVEDIS_SERVER\"])\n",
    "from frovedis.mllib.linear_model import Ridge as fRR\n",
    "f_est = fRR(alpha=0.001, lr_rate=1e-05, max_iter=200)\n",
    "E_NM = TARGET + \"_frovedis_\" + frovedis.__version__\n",
    "evaluate(f_est, E_NM, x_train, y_train, x_test, y_test)\n",
    "f_est.release()\n",
    "FrovedisServer.shut_down()\n",
    "\n",
    "import sklearn\n",
    "from sklearn.linear_model import Ridge as sRR\n",
    "s_est = sRR(alpha=0.001, solver='sag', max_iter=200)\n",
    "E_NM = TARGET + \"_sklearn_\" + sklearn.__version__\n",
    "evaluate(s_est, E_NM, x_train, y_train, x_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.5 LinearSVR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/adityaw/virt1/lib64/python3.6/site-packages/sklearn/svm/_base.py:258: ConvergenceWarning: Solver terminated early (max_iter=1000).  Consider pre-processing your data with StandardScaler or MinMaxScaler.\n",
      "  % self.max_iter, ConvergenceWarning)\n"
     ]
    }
   ],
   "source": [
    "TARGET = \"svr\"\n",
    "import frovedis\n",
    "from frovedis.exrpc.server import FrovedisServer\n",
    "FrovedisServer.initialize(\"mpirun -np 8 \" +  os.environ[\"FROVEDIS_SERVER\"])\n",
    "from frovedis.mllib.svm import LinearSVR as fLSVR\n",
    "f_est = fLSVR(lr_rate=0.0008)\n",
    "E_NM = TARGET + \"_frovedis_\" + frovedis.__version__\n",
    "evaluate(f_est, E_NM, x_train, y_train, x_test, y_test)\n",
    "f_est.release()\n",
    "FrovedisServer.shut_down()\n",
    "\n",
    "import sklearn\n",
    "# NOTE: the sklearn side uses the kernel-based SVR (default kernel: 'rbf'), not sklearn.svm.LinearSVR,\n",
    "# so this comparison with frovedis LinearSVR is not strictly like-for-like.\n",
    "from sklearn import svm as sSVMR\n",
    "s_est = sSVMR.SVR(epsilon=0.0, max_iter=1000, tol=0.0001)\n",
    "E_NM = TARGET + \"_sklearn_\" + sklearn.__version__\n",
    "evaluate(s_est, E_NM, x_train, y_train, x_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.6 DecisionTreeRegressor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "TARGET = \"dtr\"\n",
    "import frovedis\n",
    "from frovedis.exrpc.server import FrovedisServer\n",
    "FrovedisServer.initialize(\"mpirun -np 8 \" +  os.environ[\"FROVEDIS_SERVER\"])\n",
    "from frovedis.mllib.tree import DecisionTreeRegressor as fDTR\n",
    "f_est = fDTR()\n",
    "E_NM = TARGET + \"_frovedis_\" + frovedis.__version__\n",
    "evaluate(f_est, E_NM, x_train, y_train, x_test, y_test)\n",
    "f_est.release()\n",
    "FrovedisServer.shut_down()\n",
    "\n",
    "import sklearn\n",
    "from sklearn.tree import DecisionTreeRegressor as sDTR\n",
    "s_est = sDTR(max_depth=5)\n",
    "E_NM = TARGET + \"_sklearn_\" + sklearn.__version__\n",
    "evaluate(s_est, E_NM, x_train, y_train, x_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.7 Lasso"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "TARGET = \"lasso\"\n",
    "import frovedis\n",
    "from frovedis.exrpc.server import FrovedisServer\n",
    "FrovedisServer.initialize(\"mpirun -np 8 \" +  os.environ[\"FROVEDIS_SERVER\"])\n",
    "from frovedis.mllib.linear_model import Lasso as fLR\n",
    "f_est = fLR(alpha=0.001, lr_rate=1e-05, max_iter=800)\n",
    "E_NM = TARGET + \"_frovedis_\" + frovedis.__version__\n",
    "evaluate(f_est, E_NM, x_train, y_train, x_test, y_test)\n",
    "f_est.release()\n",
    "FrovedisServer.shut_down()\n",
    "\n",
    "import sklearn\n",
    "from sklearn.linear_model import Lasso as sLR\n",
    "s_est = sLR(alpha=0.001, max_iter=800)\n",
    "E_NM = TARGET + \"_sklearn_\" + sklearn.__version__\n",
    "evaluate(s_est, E_NM, x_train, y_train, x_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### <font color='green'> 4. Performance Summary </font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                estimator  train time  test time  train-score  test-score\n",
      "0     lnr_frovedis_0.9.10      0.7077     0.2203     0.864830    0.861450\n",
      "1      lnr_sklearn_0.24.1      0.4939     0.1314     0.864834    0.861452\n",
      "2     sgd_frovedis_0.9.10      0.2518     0.2311     0.829700    0.829750\n",
      "3      sgd_sklearn_0.24.1     49.3236     0.0634     0.815368    0.814967\n",
      "4     knn_frovedis_0.9.10      0.2209     3.5279     0.821526    0.828232\n",
      "5      knn_sklearn_0.24.1      0.0204    43.3279     0.821526    0.828232\n",
      "6   ridge_frovedis_0.9.10      0.2197     0.2426     0.825246    0.825292\n",
      "7    ridge_sklearn_0.24.1      9.2877     0.0414     0.864844    0.861442\n",
      "8     svr_frovedis_0.9.10      0.5324     0.2557     0.776946    0.778071\n",
      "9      svr_sklearn_0.24.1     24.5304    32.9193     0.772782    0.770263\n",
      "10    dtr_frovedis_0.9.10      0.3235     0.2687     0.790788    0.786716\n",
      "11     dtr_sklearn_0.24.1      1.6639     0.0454     0.792260    0.786357\n",
      "12  lasso_frovedis_0.9.10      0.4499     0.3391     0.833455    0.832300\n",
      "13   lasso_sklearn_0.24.1     12.5618     0.0898     0.864686    0.861260\n"
     ]
    }
   ],
   "source": [
    "summary = pd.DataFrame(OrderedDict({ \"estimator\": estimator_name,\n",
    "                                     \"train time\": train_time,\n",
    "                                     \"test time\": test_time,\n",
    "                                     \"train-score\": train_score,\n",
    "                                     \"test-score\": test_score\n",
    "                                  }))\n",
    "print(summary)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
